Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Link to external docsets (mainly apple) when parsing a usr in a declaration. Additionally, creates a "link" to a usr for any type, function, etc. mentioned in a doc comment. #537

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

galli-leo
Copy link

@galli-leo galli-leo commented Jun 19, 2018

This is very WIP and will probably change quite a bit. I decided to create this pr, since @johnfairh created a commit sketching out the implementation for jazzy :P (though this is gonna be quite different from that). And maybe someone can point some stuff out for this pr :)

Overview

USRResolver

The most important new addition is the USRResolver class. Currently, it can "only" resolve apple usrs to their online documentation url and "create" usrs from simple types, such as UIImage or MyClass.Test.

(Note: one test case of this, is actually currently hardcoded just for testing purposes in the documentation command for the "Test" class. However, using cursorinfo already works quite well, and most types can already be resolved.)

Main idea behind USRResolver

It has two main purposes:

  1. Resolving usrs in external urls

  2. Resolving dot notation (e.g. Class.function) into a usr, so other programs can link to Types, etc. in the same module (mostly jazzy for now).

The idea is to add a USRLINK tag to all documentation comments as well, once the usr resolving is fully implemented. This would act as a „secondary“ pass on the docs command. I.e. after all documentation is gathered, we could then parse all doc comments for any „dot notation“ and add usrlinks.
For this to work, my current idea is, to "register" any usrs encountered while documenting a module into an "index".

Additionally, I would like to add another command for parsing standalone "dot notation", found in jazzy (for e.g. external markdown) and returning the usr.

To allow resolving any dot notation, it would be beneficial to write this "index" to a file, so that we can still query usrs with later commands (e.g. when using jazzy with other markdown files). IMO this would not only increase the usefulness of the standalone sourcekitten application, but also fill a hole created by the removal of docsetutils. (This is true for most of this pr, we can also add a standalone command for converting usrs to external doc urls).

Of course all of this is optional, e.g. the linked usrs are in a new property and saving the index file should be a new parameter.

Generally, I think this PR would help other projects besides jazzy as well. SourceKittenDaemon for example, could add something similar to quickhelp with this and have automatic linking to other types from both apple and the developer.

Questions

  • How should the index look like? My current implementation is a linked list. I.e. every element on the list can also be resolved to a usr, something like this:
//Code
public class Test {
  public func test(first:String) {
     print("Hello")
  }
}

//SourceKitten Index, just some sample data
[
  {
    "name":"Test",
    "usr":"s:TestFramework20Test",
    "type":"class",
    "children:" [
      "s:TestFramework20Test16testSs"
    ]
  },
 {
    "name":"test(first:)",
    "usr":"s: TestFramework20Test16testSs",
    "type":"function",
    "parents:" [
      "s: TestFramework20Test"
    ]
  }
]

If a person now queries Test.test (or someone writes that in a doc comment). USRResolver would first search for anything named Test. Then search if any of it's children are called test. If multiple possible matches are returned, parameters would be looked at. If we can narrow it down to one function, property, etc. we return the corresponding usr.

  • Should it be bottom up or top down search? i.e. should we first search for the function or type when querying the usr for Test.test(_:)? Currently I follow jazzy's implementation and do top down.

  • Should I reimplement sha1 (i.e. link to Security and use the low level apis) or try and get CryptoSwift to play nice on the build servers? Seems to be the reason for failing tests. Quite a heavy dependency for one function imo.

Examples:

Some sourcekitten outputs:

"key.parsed_annotated_decl" : "public func test(first: <USRLINK usr=\"s:SS\" url=\"https:\/\/developer.apple.com\/documentation\/swift\/string\">String<\/USRLINK>, second: <USRLINK usr=\"s:SS\" url=\"https:\/\/developer.apple.com\/documentation\/swift\/string\">String<\/USRLINK>, closure: @escaping ((<USRLINK usr=\"s:SS\" url=\"https:\/\/developer.apple.com\/documentation\/swift\/string\">String<\/USRLINK>, <USRLINK usr=\"s:SS\" url=\"https:\/\/developer.apple.com\/documentation\/swift\/string\">String<\/USRLINK>) -> (<USRLINK usr=\"s:SS\" url=\"https:\/\/developer.apple.com\/documentation\/swift\/string\">String<\/USRLINK>, <USRLINK usr=\"s:SS\" url=\"https:\/\/developer.apple.com\/documentation\/swift\/string\">String<\/USRLINK>))) throws -> <USRLINK usr=\"s:SS\" url=\"https:\/\/developer.apple.com\/documentation\/swift\/string\">String<\/USRLINK>"
"key.parsed_declaration" : "public func test(first: String, second: String, closure: @escaping ((String, String) -> (String, String))) throws -> String"

"key.parsed_annotated_decl" : "public override var description: <USRLINK usr=\"s:SS\" url=\"https:\/\/developer.apple.com\/documentation\/swift\/string\">String<\/USRLINK> { get }"
"key.parsed_declaration" : "public override var description: String"

"key.parsed_annotated_decl" : "public class Test : Swift.<USRLINK usr=\"s:s23CustomStringConvertibleP\" url=\"https:\/\/developer.apple.com\/documentation\/swift\/customstringconvertible\">CustomStringConvertible<\/USRLINK>",
"key.parsed_declaration" : "public class Test : Swift.CustomStringConvertible",

Linked issues, etc.

@galli-leo galli-leo changed the title [WIP] Link to external docsets (mainly apple) when parsing a usr in a declaration (or doc comments). [WIP] Link to external docsets (mainly apple) when parsing a usr in a declaration. Additionally, creates a "link" to a usr for any type, function, etc. mentioned in a doc comment. Jun 24, 2018
@galli-leo
Copy link
Author

galli-leo commented Jun 24, 2018

Update: Finished the first implementation of the "index" and source kitten now auto resolves any usrs inside doc comments. For a sample output, see https://gist.github.com/galli-leo/169a8a65bc504fd8d4db78ca7ecdfdf8.

The code is really messy atm, since we first need to "register" all swift docs and then "parse" the doc comments for them. Any suggestions on how to improve this / incorporate the same thing for objc?

In addition to the way jazzy currently handles dot notation, I also made _: match any parameter name. From the looks of it, this does not currently work in jazzy.

Update 2: Integrated this into my jazzy branch. A live demo of this feature can be found here: http://galli.me/jazzy-demo/index.html. I created some pretty hard cases (imo), e.g. Enum inside a function. And they all seem to work fine. The new implementation into jazzy is still really messy, but for starters it should give a good idea how one can implement the new fields from this pr.

@jpsim
Copy link
Owner

jpsim commented Jun 24, 2018

This is both very promising and kinda scary 😁

@galli-leo
Copy link
Author

@jpsim Haha yeah, I can understand that. Especially since rebuilding jazzy's autolinking is maybe a bit overtop 😬.

I know the pr is currently wip, but could you take a quick look and give some feedback on the general implementation? If it's mostly good, I can start working on finalizing and adding tests.

Btw. I noticed that the libclang shipped with Xcode is pretty outdated. Some functions (e.g. clang_getCursorPrettyPrinted) aren't implemented there yet. Not sure if that's relevant to SourceKitten though.

@jpsim
Copy link
Owner

jpsim commented Jun 24, 2018

I know the pr is currently wip, but could you take a quick look and give some feedback on the general implementation? If it's mostly good, I can start working on finalizing and adding tests.

I don't even know where to start, to be honest. +3,014 −771 and all CI jobs failing is too much for me to dive in.

Btw. I noticed that the libclang shipped with Xcode is pretty outdated. Some functions (e.g. clang_getCursorPrettyPrinted) aren't implemented there yet. Not sure if that's relevant to SourceKitten though.

The parts of libclang used by SourceKitten have been in there for a long time, so it doesn't affect us currently.

@galli-leo
Copy link
Author

@jpsim Oh yeah, didn't even notice that. Though almost all of that's from the Xcode project. Not sure why, I have very little experience with SPM. I can try to revert the xcodeproj changes.

As for the failing tests, what would you recommend to do with CryptoSwift? Drop it and handle sha1 on our own or try to fix the tests?

@jpsim
Copy link
Owner

jpsim commented Jun 24, 2018

Oh yeah, didn't even notice that. Though almost all of that's from the Xcode project. Not sure why, I have very little experience with SPM. I can try to revert the xcodeproj changes.

Ok, yeah looking more closely I see that. That's ok, I can ignore those for now. I was mostly taken aback by the debug logs that are still in the implementation.

As for the failing tests, what would you recommend to do with CryptoSwift? Drop it and handle sha1 on our own or try to fix the tests?

I'm not opposed to depending on CryptoSwift, but sha1 does seem relatively easy enough to integrate.


In any case, moving forward, yes cleaning up the changes here and getting SourceKitten to build again would be a good start, removing the debug logs and adding tests, showing Jazzy's integration specs built with this change to very concretely see the impact this has, all those things will make it easier for maintainers to dive in and review what you've built here.

It's very promising 😄

@galli-leo
Copy link
Author

@jpsim How exactly does the Makefile work? i.e. how is the Xcode workspace used? Is it generated? Because tests seem to fail on Xcode, due to the workspace not containing the latest files.

@galli-leo
Copy link
Author

Ok I updated the integration specs and uploaded them here: http://galli.me/jazzy-demo/integration_specs/

It seems to be working quite well already!

A few things to note:

  • Objc does not work yet, need to find a good solution. (Worst case we just use the jazzy autolinking again)
  • Not the best match is chosen when finding a usr with dot notation (e.g. Request links to DataRequest).(This will be addressed in the future).
  • Find using cursorinfo is not yet used, since we don't save the compiler args. Will be resolved in the future. (This should help with wrongly linking Data to something completely different).
  • Guides are not yet autolinked. For this we need to add a new sourcekitten command and add the ability to write out the index to file.

@galli-leo
Copy link
Author

The matches chosen are now much better, however could still be improved. I have updated all integration fixtures under http://galli.me/jazzy-demo/integration_specs/

What do you think is more important (i.e. should be considered first): Linking to the closest match in name (e.g. linking test to Enum.test but not Class.test()) or linking to the "closest" match in terms of hierarchy and code (e.g. link test in a doc comment on the Class class to Class.test() instead of Enum.test)?

Also, do you have some tips on the failing tests? I cannot seem to find how the xcworkspace used is generated.

@galli-leo
Copy link
Author

@jpsim So after looking through the documentation command code, I think we should refactor the whole documentation generation a bit. This would greatly simplify the USRResolver.register and resolve parts and would prevent code duplication as well as allows the framework to be more easily integrated into other applications.

My current idea would be, to move the documentation generation out of the Module class. Instead the module class would only serve to gather the compiler args & files to document. These would then be passed to SwiftDocs which are actually creating the docs then.

Furthermore, I would also propose, to either merge ClangTranslationUnit into SwiftDocs or create a common parent class.

What do you think? I know this is probably out of scope for this pr, but it would greatly help with implementing some things from here. Should I open another pr instead?

@galli-leo
Copy link
Author

So I took a look at the objective c part and found some good things.

We can call tokenize on the cursor for a declaration and then iterate through all tokens. This way we can check if something is a reference, if yes get the usr referenced and save the range. Finally we can pull the declaration from the source file using the extent of the cursor and add the tags using the extracted ranges.

However, I have also found some caveats:

  1. The tokenize functions require a pointer to a pointer (i.e. a pointer to a c style array). This is really annoying and somewhat unsafe to do in Swift. Do you think that function should be wrapped in C?
  2. Unfortunately for class and category declaration the extent of a cursor includes everything in the class (comments, functions, etc.). To alleviate this, I propose we strip the declaration parsed from the doc comment (like it's done until now) and use that as a guide for where to stop. What do you think?

@galli-leo
Copy link
Author

@jpsim What do you think re: the above questions? I cannot move forward without an answer to them :/ (Sorry if it's a long wall of text, I could try cleaning up the questions if that makes it easier for you).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants